ABSTRACT 

A meta-analytical literature review was performed on 
the literature in which computerized and paper based information 
retrieval systems were compared. Specifically, online catalogs were 
compared with card catalogs, and online bibliographic retrieval was 
compared with searching printed indexes. Studies which included 
information on relevance, precision, time, or costs of searching were 
selected. A total of 25 studies published between 1967 and 1989 met 
the selection criteria, producing a total mean effect size of -0.383. 
The analysis revealed that there were no significant differences 
between the two systems for the variables of relevance, time, or 
costs. The paper based systems were significantly superior on the 
precision variable. The variance in individual study results could 
not be explained by any of the factors that were included in the 
analysis. These factors included the publication date, publication 
mode, method of effect size computation, library environment, and 
search complexity. It is hypothesized that the variability in study 
methodology might explain the variability in study results. Specific 
recommendations are made for more standardized methods in future 
research in which information retrieval systems are compared. The 
individual study results are appended. (54 references) 
(Author/MAB) 
Abstract 

A meta-analytical literature review was performed on the 
literature in which computerized and paper based information 
retrieval systems were compared. Studies which included 
information on relevance, precision, time or costs of searching 
were selected. A total of 25 studies published between 1967 and 
1989 met the selection criteria. A total mean effect size of 
-0.383 was produced. The analysis revealed that there were no 
significant differences between the two systems for the variables 
of relevance, time or costs. The paper based systems were 
significantly superior on the precision variable. The variance 
in individual study results could not be explained by any of the 
factors that were included in the analysis. These factors 
included the publication date, publication mode, method of effect 
size computation, library environment and search complexity. It 
is hypothesized that the variability in study methodology may 
explain the variability in study results. Specific 
recommendations are made for more standardized methods in future 
research in which information retrieval systems are compared. 
The computerization of library services, specifically of 
information retrieval, has been the most discussed and debated 
topic within library circles in recent years. Since the 
inception of computerized indexes in the early 1960s, countless 
books and journal articles have been published which discuss the 
advantages and disadvantages of automated information retrieval. 
Indeed, many new journals have been introduced, which have as 
their expressed purpose the reporting of information on this 
particular aspect of automation. 

Though some authors state that librarians universally 
consider this automation an improvement (Lipow, 1989; Miller, 
1989), others are just as vehement in opposing the new systems 
(Kusack, 1988a, 1988b). Those who favor the automation of 
information retrieval point to the many advantages which 
computerized systems should be able to offer. Librarians in this 
camp claim that automation will lead to improved productivity and 
error control; and also to increased speed, range and depth of 
service. More specifically, online systems are expected to 
increase the depth of indexing, provide multiple access points, 
allow for better updating and allow librarians to use materials 
that are not physically present. Those who favor traditional 
paper based systems point to the portability, the browsing 
capabilities, and the freedom from equipment that these systems 
offer (Lancaster, 1977, 1982; Lipow, 1989). Unfortunately, 
however, most librarians are not basing their opinions on any 
real advantage offered by either system. There is evidence that 
the positions taken by most librarians on this topic are based 
more on irrational opinions than on hard evidence (Kusack, 1988a, 
1988b). 

Whatever the opinions expressed by librarians, many 
libraries have already automated their information retrieval 
systems. Seventy four percent of academic libraries now have 
online catalogs (Epple & Ginder, 1987), and online database 
searching has become commonplace (Medow, 1988). These innovations 
have a direct impact on library users. Library researchers have 
been quick to administer surveys that attempt to assess users' 
opinions of these new technologies. These surveys seem to show 
that users overwhelmingly support the automation of information 
retrieval (Kranich, Spellman & Hecht, 1984; Lipow, 1989; Moore, 
1984). Most of these researchers report that approximately 75% 
of those surveyed prefer the automated systems (California 
University, 1983; Ferguson, 1982; Lawrence, 1982; Markey, 1983), 
although results as high as 94% (Dowlin, 1980) and as low as 68% 
(Shuman, 1983), 64% (Pease & Gouke, 1982) and 16% (Edmonds, Moore 
& Balcome, 1989) have also been reported. Although these results 
appear to be conclusive, the methodologies used are not beyond 
reproach. Most are highly vulnerable to the Hawthorne effect. 
Also, library patrons' expressed opinions may not be an accurate 
measure of their true satisfaction. Kranich, Spellman and Hecht 
(1984) provide some insight into this problem. The results of 
their study showed that 63% of the subjects using the card 
catalog found the material that they were seeking, while only 35% 
of the online group subjects successfully completed their 
searches. Yet, these researchers also reported that 73% of the 
subjects preferred the online catalog. These results are 
obviously inconsistent with one another. 

Although the purposes of the authors of all of these 
research papers and opinion articles have been to clarify the 
issues involved in the computerization of information retrieval 
in libraries, the real result has often been a clouding of the 
issues. So much has been written on this topic, and so much 
conflicting evidence has been reported, that the sum effect is 
confusion rather than clarity. A further problem for researchers 
in this area has been the economic realities of library 
operations. If automated information retrieval systems were 
implemented as additions to library services there could be no 
argument that they provide for increased capabilities over 
traditional systems alone. However, for economic reasons, 
automated systems often replace rather than augment existing 
systems (Epple & Ginder, 1987; Lancaster, 1982). 

There is, therefore, a need for studies which experimentally 
compare the merits of these two systems; and also for reviews 
that provide digests of all that has been written on the 
automation of information retrieval. This research project is 
designed to provide librarians with answers to these two needs. 
These goals will be accomplished through a statistical review of 
the experimental evidence that has been reported in the library 
literature which directly compares automated and traditional 
information retrieval systems in libraries. This will be 
accomplished through a methodology known as meta-analysis. 

Methods 

Description of Meta-analysis 

The methods used in meta-analytic research developed slowly 
during the 1950s and 1960s, as researchers sought to cope with 
the vast quantity of experimental data that was available for 
many research questions. These methods were first described as a 
distinct research methodology, and were first called meta- 
analysis, by Gene Glass in 1976. Other researchers, most notably 
Robert Rosenthal, have proposed alternate methods for integrating 
study findings (Bangert-Drowns, 1984; Glass, McGraw & Smith, 
1981; Rosenthal, 1984). This project will use the principles 
described by Glass, with some modifications, as advocated by 
Bangert-Drowns. 

Glass considers meta-analysis to be the incorporation of 
scientific and statistical methods into the practice of reviewing 
the literature. Meta-analytic literature reviews should be held 
to the same standards as primary research. This means that 
methodologies should be clearly described, results should be 
statistically analyzed and results should be replicable. The 
advantages of this type of research are many. The results are 
not dependent on the bias of the reviewer, and often a robust 
overall result can be obtained by combining the results of many 
inconclusive studies. Also, meta-analysis provides an 
opportunity for evaluating the methods used in the primary 
research that is analyzed. Disadvantages, or criticisms, of 
meta-analysis include: that it lumps together data from studies 
done in different environments, that data from low quality 
studies is used, and that only published data is available for 
integration. Glass, McGraw and Smith (1981) provide rebuttals to 
each of these criticisms, and the methods used in this study will 
be designed to reduce the impact of these problems. 
Data Collection 

Studies which compared online information retrieval with 
paper based retrieval were located by searching ERIC, LISA, ISA, 
NTIS, Library Literature, Dissertation Abstracts, and the Online 
Information Retrieval Annual Bibliography, which is published in 
Online Review. To reduce the variability in the research 
environments, the study pool was limited to those works which 
compared online catalogs with card catalogs, or online 
bibliographic retrieval with searching printed indexes. Each 
study was required to report information on at least one of the 
following dependent variables: recall of relevant material, 
precision of the recalled set, the time necessary to identify 
each relevant hit, and the costs involved in identifying each 
relevant hit. It was necessary to reject many studies for 
reporting insufficient data. In order to be included, studies 
had to report one of the following forms of numerical results: 
means and standard deviations; recall or precision ratios, with 
the total number of relevants also reported; F, t, or chi square 
statistics; or data from which any of these could be computed or 
estimated. Each study was also analyzed for its geographic 
location, year and mode of publication, library type, and search 
complexity. For the purposes of this analysis searching was 
defined as simple or complex depending upon whether Boolean logic 
was used. 
Data Analysis 

For each study an effect size (es) was computed. Glass' 
definition of effect size is given in equation 1. 

$$ es = \frac{M_e - M_c}{SD} \qquad (1) $$

$M_e$ in this equation is the mean of the experimental group, in 
this case the online group, $M_c$ is the mean of the control or 
manual group, and $SD$ is the standard deviation of the control 
group. Thus studies which show online searching to be superior 
will have positive effect sizes, and studies with results that 
favor manual systems will have negative effect sizes. The effect 
sizes for studies which reported means and standard deviations 
were computed according to this equation. It should be noted, 
however, that on the variables of time and cost it was necessary 
to change the sign of the result in order to keep the convention 
of having positive effect sizes for studies that favor 
automation. This was necessary since smaller means are superior 
for these variables. The effect sizes of studies that supplied 
other statistics were computed using other formulas provided by 
Glass, McGraw and Smith (1981). Unfortunately, the effect sizes 
for studies which reported chi squares, recall ratios or 
precision ratios could only be estimated, since these are 
nonparametric statistics. Glass also provides formulas to perform 
these estimates. 
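
Equation 1 and the sign convention for time and cost can be sketched as follows. This is only an illustration under stated assumptions: the function name, its parameters, and the sample numbers are hypothetical and are not taken from any study in the review.

```python
def glass_effect_size(mean_online, mean_manual, sd_manual, lower_is_better=False):
    """Effect size per equation 1: (online mean - manual mean) / manual SD."""
    if sd_manual <= 0:
        raise ValueError("control-group standard deviation must be positive")
    es = (mean_online - mean_manual) / sd_manual
    # For time and cost a smaller mean is superior, so the sign is flipped
    # to keep positive values meaning "favors the online system".
    return -es if lower_is_better else es


# Hypothetical illustration values (not from the reviewed studies).
recall_es = glass_effect_size(12.0, 10.0, 4.0)
time_es = glass_effect_size(6.0, 9.0, 2.0, lower_is_better=True)
print(recall_es, time_es)
```

In this sketch an online group that retrieves more relevant citations, or that needs less time per hit, yields a positive effect size in both cases.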

An effect size was computed for each dependent variable of 
each study. These dependent variables include recall, precision, 
time per relevant citation and cost per relevant citation. An 
overall effect size was then computed for each study. Mean 
effect sizes (ES) were computed for each of the dependent 
variables, and for the overall study effect sizes. These ES were 
tested for statistical significance with t tests. In addition, 
the results were analyzed on the basis of the publication date, 
publication mode, library type, search complexity, and means of 
computing es. These analyses were made to explain the 
variability in individual study findings. 

Results 

Twenty five studies were located which met the selection 
criteria. The publication dates ranged from 1967 to 1989, with a 
mean of 1980. Eighteen of these studies were published in 
journals, three as ERIC documents, two as parts of conference 
proceedings, one as a research report, and one as a dissertation. 
A few of the studies were national in scope; nine states and four 
foreign countries were also represented. The foreign countries 
included the United Kingdom, Sweden, the Netherlands and Japan. 
All sections of the United States were represented, including the 
northeast, southern, midwest, mountain and west coast states. 
For thirteen of the research projects the environment was an 
academic library. Eleven took place in special libraries and one 
in a public library. For ten of the studies the effect size was 
computed, for the remaining fifteen it was estimated. The search 
complexity could be defined as complex in thirteen studies and as 
simple in six. One study reported data for both simple and 
complex searches. For the remaining studies the search 
complexity could not be determined. All of the studies reported 
data on the recall of relevant material, eight on precision, 
eleven on time and seven on costs. The overall effect size most 
favorable to online searching was +4.750; that most favorable to 
traditional systems was -2.831. Negative effect sizes were 
obtained from nineteen of the papers; positive effect sizes from 
six. The data obtained from each study are reported in the 
appendix. 

The mean effect size for each of these variables was 
computed. These results were: overall ES -0.383 with a standard 
deviation of 1.33; recall ES -0.503, standard deviation 1.47; 
precision ES -1.197, standard deviation 0.83; time ES +0.815, 
standard deviation 1.35; and cost ES -0.171, with 1.59 as the 
standard deviation. Statistical significance was obtained only 
on the precision variable. These results are summarized in Table 
1. 
Table 1 

Statistical Analysis of Effect Sizes 

Variable     N    SD      ES       Statistic   Significance 

Total        25   1.35    -0.383   t=1.41      no 
Recall       25   1.47    -0.503   t=1.71      no 
Precision     8   0.83    -1.197   t=4.0       p<.01 
Time/hit     11   1.35    +0.815   t=1.99      no 
Cost/hit      7   1.59    -0.171   t=0.28      no 
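
The t statistics in Table 1 can be approximately recomputed from the N, SD and ES columns. The sketch below assumes a one-sample t test of each mean effect size against zero, which the text implies but does not spell out; the critical values are the standard two-tailed .05 values of t for the relevant degrees of freedom, and all names are illustrative.

```python
import math

# (N, SD, mean ES) for each variable, as listed in Table 1.
TABLE_1 = {
    "Total": (25, 1.35, -0.383),
    "Recall": (25, 1.47, -0.503),
    "Precision": (8, 0.83, -1.197),
    "Time/hit": (11, 1.35, 0.815),
    "Cost/hit": (7, 1.59, -0.171),
}

# Two-tailed critical values of t at the .05 level, keyed by df = N - 1.
T_CRIT_05 = {24: 2.064, 10: 2.228, 7: 2.365, 6: 2.447}


def one_sample_t(n, sd, mean_es):
    """Absolute t statistic for testing a mean effect size against zero."""
    return abs(mean_es) / (sd / math.sqrt(n))


results = {}
for name, (n, sd, mean_es) in TABLE_1.items():
    t = one_sample_t(n, sd, mean_es)
    results[name] = (round(t, 2), t > T_CRIT_05[n - 1])

for name, (t, significant) in results.items():
    print(f"{name:10s} t={t:.2f} significant={significant}")
```

Under these assumptions the recomputed values fall within a few hundredths of the tabulated ones, and precision is the only variable that exceeds its critical value.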



Further analysis was done on the total mean effect size. 
The purpose of this analysis was to explain the variability in 
the individual studies' effect sizes. This analysis did not 
produce any statistically significant results. These results are 
summarized in Table 2. 
Table 2 

Analysis of Variance Among Studies 

Variable               N    SD      ES       Statistic   Significance 

Year of Publication 
  overall              25   1.35    -0.383   r=0.05      no 

Computation of es 
  computed             10   0.69    -0.305   t=0.22      no 
  estimated            15   1.65    -0.433 

Publication type 
  journal              18   1.47    -0.259   t=0.71      no 
  other                 7   0.93    -0.701 

Library type 
  academic             13   0.76    -0.462   t=0.73      no 
  special              11   1.68    -0.066 

Search Complexity 
  complex              14   1.53    +0.051   t=1.92      no 
  simple                7   0.88    -1.211 
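
The subgroup comparisons in Table 2 can be roughly checked from the reported N, SD and ES values. The source does not say which form of t test was used, so the pooled-variance two-sample test below is an assumption, as are the function and variable names; the N of 13 for academic libraries is taken from the Results text.

```python
import math


def pooled_t(n1, sd1, m1, n2, sd2, m2):
    """Absolute two-sample t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return abs(m1 - m2) / standard_error


# Each entry pairs (N, SD, mean ES) for the two subgroups from Table 2.
SUBGROUPS = {
    "computation of es": ((10, 0.69, -0.305), (15, 1.65, -0.433)),
    "publication type": ((18, 1.47, -0.259), (7, 0.93, -0.701)),
    "library type": ((13, 0.76, -0.462), (11, 1.68, -0.066)),
    "search complexity": ((14, 1.53, 0.051), (7, 0.88, -1.211)),
}

t_values = {name: pooled_t(*a, *b) for name, (a, b) in SUBGROUPS.items()}

for name, t in t_values.items():
    print(f"{name:18s} t={t:.2f}")
```

Under this assumption the recomputed statistics land within about 0.1 of the tabulated values, and the search complexity comparison stays just below 2.093, the two-tailed .05 critical value for 19 degrees of freedom.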



Discussion 
Analysis of Average Effect Sizes 

Although most of these results do not show statistical 
significance, there is a strong practical significance. These 
results clearly indicate that computerized information retrieval 
systems do not, when taken collectively, offer any advantage over 
the traditional paper based systems. On the other hand, paper 
based systems also have no clear overall advantage. These 
results have three important implications for libraries. The 
first is that libraries which are contemplating the automation of 
their public service information retrieval systems should not 
cite the improvement of information retrieval as a reason for 
pursuing automation. The automated systems analyzed in this 
review were superior to the manual systems only on the variable 
of time, and this superiority was not statistically significant. 
This is not meant to suggest that libraries should abandon 
computers. Automation has many advantages. There is little 
doubt that library technical services have become more efficient, 
and that cooperation among libraries has been enhanced. The 
overwhelmingly favorable reaction of library users to automation, 
as mentioned in the introduction, is also undeniable. Librarians 
should be aware, however, that the improvement of information 
retrieval in library public services is not one of the advantages 
offered by automation. 

A corollary of this finding is that libraries which are 
striving to improve their public services should not see either 
automation or de-automation as the answer. Many of these studies 
reported that there was little overlap in the citations produced 
by the two systems. This indicates that both systems together 
may be the best alternative. This conclusion is supported by the 
research of Maciuszko (1987) and Caren and Somerville (1986). 
Maciuszko, who was not impressed with the retrieval of either 
system, states that "to abandon totally one system in favor of 
the other may prove a disservice to the researcher" (Maciuszko, 
1987, p. 309). 

The second main implication of these results is that online 
systems have not met their potential. This conclusion has been 
reached by many other authors (Kusack, 1988b; Lipow, 1989). 
Again, the implication is not that we should abandon automated 
information retrieval, but that many enhancements will be 
necessary before its promise is fulfilled. 

The final implication concerns the cost variable. Online 
searching is often claimed to be exceedingly expensive. Breen 
(1987) states that free online searching is economically 
unfeasible for most academic libraries, and reports that 73% of 
these libraries charge for online searches. This meta-analysis, 
however, does not support the assertion that online searching is 
significantly more expensive than paper based searching. East 
(1980), in an earlier review, reached the same conclusion. East 
also, however, provides a reason for the lack of significance: 
most studies are actually only crude comparisons, and don't take 
all aspects of the costs into account. This author has reached a 
similar conclusion. This conclusion was supported by Cohen and 
Young (1986). These researchers performed the only complete 
analysis of costs that was analyzed for this review. They found 
that print was cheaper for all databases analyzed, at both one 
year and five years of use. 
Analysis of Variance Among Studies 

The lack of significance in the analysis of the variance 
among the individual study findings presents more of a problem. 
Analysis was made on the basis of year because of the often 
repeated assumptions that online systems are improving, and that, 
as computers become more commonplace, users will become more 
comfortable with their presence in the library. The relationship 
between publication year and study result, however, was virtually 
zero. Further evidence that users are not getting better at 
online searching is provided by Edmonds, Moore and Balcome (1989). 
The subjects in this recent study were 10 to 15 year-olds. They 
were members of the so called computer generation. Yet, the 
effect size of this study was the most negative of all of those 
that were analyzed. This was also the only study among those 
analyzed in which the subjects preferred the card catalog. These 
results clearly indicate that user success with online systems 
has improved little since the inception of these systems 25 years 
ago. 

The data was analyzed on the variables of publication type, 
and method of determining effect sizes, to check for a 
publication bias, and to ensure that the estimations of effect 
sizes were accurate. Analysis on these variables is suggested by 
Glass, McGraw and Smith (1981). Statistical insignificance on 
these variables is desirable, since significance would indicate 
that there was a problem with this research. The results on 
these variables, found in Table 2, show that there is no apparent 
bias in the published literature on this subject. There was also 
no problem with estimating the effect sizes for studies that 
reported insufficient data for computation. The average effect 
sizes for results that were computed and for results that were 
estimated were virtually identical. 

Statistical significance was expected on the variable of 
search complexity. Research results reported by Havener (1988), 
and by Smille, Nugent, Sander and Johnson (1989), indicated that 
there was a difference in retrieval performance between simple 
and complex searches. Both of these papers reported data for 
both simple and complex searches; and both found that online 
systems were more advantageous for complex searches. Significant 
results, however, were not achieved on this variable in this 
meta-analysis, though this variable did produce a difference 
greater than that of any other variable. The average effect size 
for complex searches was +0.051; for simple searches it was 
-1.211. Significance may have been precluded on this variable 
because it was impossible to determine the degree of search 
complexity, and because complexity could not be determined at all 
for 3 of the studies. 

The last variable on which variance was analyzed was library 
type. Analysis was performed on this variable because different 
types of libraries have widely different environments and widely 
divergent queries. The results from academic and special 
libraries were used in this analysis. The results from the one 
study done in a public library were not used. Significance was 
not achieved on this variable. This indicates that there was no 
difference between these two library types when comparing the 
efficiency of search systems. 

If the variance could not be explained by study date, effect 
size computation, publication type, or library type; and if it 
could only partially be explained by the search complexity, then 
how can we account for the considerable variance that did exist? 
There are at least two other possibilities. Unfortunately, 
analysis was not possible on these two variables. One 
possibility is the expertise of the searcher. It is known that 
end users and trained searchers produce very different results. 
Analysis was not possible on this variable because many of the 
researchers failed to provide this information; and also because it 
was not uncommon for patrons to perform the manual search and 
librarians the online search. 

The second alternative variable for explaining the variance 
in study findings is the quality of the methodology employed in 
the original study. The studies analyzed in this meta-analysis 
employed many different methodologies. Some of these 
methodologies were unbiased and well validated (Edmonds, Moore & 
Balcome, 1989). Others were not validated at all, and blatantly 
favored one system or the other (Naber, 1985; Poynard & Conn, 
1985). It should be noted that the effect sizes extracted from 
these and other questionable studies are among the most extreme 
of all of those that are reported in the appendix. It follows 
that much of the variance in individual study findings could 
possibly be accounted for by this variable. Analysis was not 
performed on this variable because it was not possible to 
quantify the quality of each study's methodology. 

Recommendations For Future Research 
One of the advantages of meta-analysis mentioned in the 
description of this methodology was that it provides an 
opportunity for evaluating primary research methodologies. The 
importance of this evaluation is illustrated above. The 
variability in method quality may have been the factor that had 
the greatest effect on the variability in the study results. 
This section of this meta-analysis will contain recommendations 
for future studies which compare information retrieval systems. 
This will be done through an analysis of the methods used in some 
of the research reviewed here. 
General Recommendations 

One of the basic principles of all social science research 
is that robust, generalizable results can be obtained only from 
studies that employ multiple subjects. Comparative research 
which is performed with a single subject can provide information 
only on that one person. This should be obvious, and yet many of 
the studies analyzed for this review, including Miller (1968), 
Gill (1974), Santodonato (1976), Langley (1976) and Murphy 
(1983), were single subject studies. For this particular type of 
research it is also important to use multiple queries, since 
different questions may be more effectively answered in one 
system or the other. It would also be useful to perform separate 
analyses for questions of different complexity. There is 
evidence which suggests that the advantages of automation may be 
better demonstrated by more complex questions (Havener, 1988; 
Smille, Nugent, Sander & Johnson, 1986). It is also important to 
randomly assign subjects to the experimental groups; or to have 
all of the subjects perform both searches. Most of the studies 
reviewed here which employed multiple subjects did adequately 
meet this requirement (Hartley, 1983; Havener, 1988). 

Another very important requirement for research in general 
is that the methods must provide a valid answer to the research 
question. All of the studies analyzed here had the same basic 
research question: do paper based and computerized information 
retrieval systems differ in their retrieval effectiveness? Some 
of the methods used were effective in answering this question, 
but others were not. Naber (1985) and Poynard and Conn (1985) 
provide examples of methodologies that do not answer this 
question. Naber used an existing printed bibliography on water 
harvesting as his control or manual search. He then performed 
multiple exhaustive searches in many different online files, and 
compared the total number of online relevant results to the 
existing bibliography. This was published as proof that the 
online system was superior. All it really proved, however, was 
that an exhaustive search could produce a few more results than 
the existing bibliography; no existing bibliography could 
possibly contain all of the relevant citations. The methodology 
used by Poynard and Conn (1985), on the other hand, was slanted 
in favor of manual systems. These researchers performed a single 
MEDLARS search as their online search, and for their manual 
result searched the contents pages of all of the journals that 
were known to publish papers on their subject. This methodology 
did not provide an answer to the research question. It actually 
showed that bibliographic searching could not substitute for an 
extensive personal knowledge of the literature. 

The last general recommendation is that the two systems must 
be compared for similar levels of service. Some of these 
studies, for example Akeroyd and Rogers (1976) and Rogers (1983), 
used methodologies in which librarians performed the online 
searches, and students the manual searches. This is, of course, 
the situation that exists in many academic libraries. This 
methodology, however, can not be used to show that either system 
is superior, since the subjects in the two groups are not 
equivalent. Researchers that employ this methodology should not 
claim to be comparing retrieval systems, and their results can 
not be considered generalizable to other environments. 
Recommendations for Determining Recall 

The number of relevant citations recalled is obviously the 
single most important dependent variable when determining 
retrieval effectiveness. All of the studies analyzed in this 
meta-analysis compared the two retrieval systems on this 
variable. Most of these researchers used recall ratios as their 
descriptive statistic. Recall ratio is defined as the number of 
relevant hits retrieved over the total number of relevant 
citations in the database. This statistic is widely used in 
information retrieval research. It is not, however, the most 
efficient or the most accurate unit of analysis. There are many 
serious problems with the computation of recall ratios. One is 
that, for most valid research methods, an extra and unnecessary 
computation is involved. All information retrieval research 
should be comparative; it is impossible to compute the absolute 
effectiveness of any system. All comparative research should use 
equivalent groups; that is, there should be equivalent numbers of 
relevant citations in the systems that are being compared. The 
denominator in the equation should therefore be the same for all 
of the systems in any well executed research. There is no reason 
to divide every result by the same total number. The computation 
of recall ratios, however, is not merely a waste of time. The 
denominator of the equation introduces an enormous confound into 
the analysis. Lancaster (1977) states that it is virtually 
impossible to adequately compute the number of relevant citations 
in a database. Researchers are therefore forced to estimate this 
figure. Different researchers, however, will produce different 
estimates; and those that are less assiduous will identify fewer 
total relevant citations for the database. In the formula for 
computing recall ratio we can see that this smaller denominator 
will result in a more impressive recall ratio. Thus we have 
produced the worst of all research situations; research that is 
poorly done will produce more significant results. 
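
The effect of the denominator can be illustrated with a short sketch. The function name and all of the counts are hypothetical; they only show how the same search result produces a higher recall ratio when the total number of relevant citations is underestimated.

```python
def recall_ratio(relevant_retrieved, estimated_total_relevant):
    """Relevant hits retrieved over the estimated total of relevant citations."""
    if estimated_total_relevant <= 0:
        raise ValueError("estimated total must be positive")
    return relevant_retrieved / estimated_total_relevant


# Hypothetical: one search that retrieves 30 relevant citations, scored
# against two different estimates of the total relevant set.
thorough_estimate = recall_ratio(30, 100)   # careful count of relevant items
careless_estimate = recall_ratio(30, 60)    # less assiduous count

print(thorough_estimate, careless_estimate)
```

The search itself is identical in both cases; only the estimate of the denominator differs, and the less careful estimate reports the more impressive ratio.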

Recall ratios should therefore not be used in this type of 
research. Researchers should instead report the mean number of 
relevant citations produced by each system, along with standard 
deviations. These statistics offer many advantages over recall 
ratios. They are easy to compute, and are universally recognized 
parametric statistics that allow for the computation of 
statistical significance. They also do not have the reliability 
problem that has been identified for recall ratios. 
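
A minimal sketch of the recommended reporting follows, assuming each subject's count of relevant citations is recorded for both systems. The counts and names are hypothetical; the sample standard deviation is used, and the effect size takes the manual group as the control, as in equation 1.

```python
from statistics import mean, stdev


def summarize_hits(relevant_counts):
    """Mean and sample standard deviation of relevant citations per search."""
    return mean(relevant_counts), stdev(relevant_counts)


# Hypothetical counts of relevant citations found by five subjects per system.
online_counts = [8, 12, 10, 14, 6]
manual_counts = [9, 11, 13, 15, 12]

online_mean, online_sd = summarize_hits(online_counts)
manual_mean, manual_sd = summarize_hits(manual_counts)

# Effect size with the manual group as control; negative favors manual.
effect_size = (online_mean - manual_mean) / manual_sd
print(online_mean, online_sd, manual_mean, manual_sd, effect_size)
```

Reporting these two figures per system is enough for a later reviewer to compute an effect size directly, without estimating the total number of relevant citations in the database.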

The number of relevant citations produced is important, 
however, only when comparing searches that are exhaustive. Many 
actual searches are not meant to produce all of the relevant 
material. Searches of this type should be evaluated not by the 
number of relevant hits produced but by the time expended in 
producing each relevant citation. This will be discussed further 
in the section on time. 

Recommendations for Determining Precision 

Precision ratios are a unit of little value. Lancaster 
(1977) identifies precision as a component of time; that is, the 
amount of time which the user needs to expend in determining the 
relevancy of the retrieved citations. This statistic is defined 
as the number of relevant citations produced over the total 
number of citations produced. Both of these figures are easily 
obtained. This statistic therefore does not have the serious 
problems that recall ratios have. However, it is still often 
poorly computed. Murphy (1985), for example, reported precision 
ratios that were seriously flawed. This researcher performed 
complex searches in various databases. She computed the 
precision ratios for the online searches in the usual manner; 
that is, the number of relevant citations produced by her search 
statement divided by the total number of citations produced by 
the search statement. The precision ratio for the manual search 
was computed differently. The denominator of the equation was 
instead the total number of citations produced by the individual 
parts of the search statement. This resulted in a very inflated 
denominator, and therefore a very small precision ratio. Other 
researchers reported precision data that was flawed in the 
opposite direction (Elchesen, 1978). The subjects in this 
research project performed their own relevancy judgments as they 
performed the manual searches. All of the final results were 
therefore relevant. They therefore reported 100% precision for 
the manual searches. The online searches, however, were treated 
differently. Relevancy judgments were not performed during the 
online searches. All of the final citations were therefore not 
necessarily relevant. The precision ratios were then computed in 
the usual manner. This resulted in precision ratios that were 
biased in favor of the manual systems. 

Since this confusion exists over the computation of 
precision ratios, it would probably be better not to compute 
them, but to include this component of the search results as part 
of the time variable, which will be discussed in the next 
section. However, if researchers feel that they have accurately 
determined precision ratios, and wish to include these ratios in 
their results, they should compute separate precision ratios for 
each subject and each question. They should then report the mean 
precision ratio for each system, along with standard deviations. 
Statistical significance should also be reported. It is 
important to point out that researchers should not report 
precision ratios if they are also including this aspect of the 
search as part of the time variable. This would be including a 
single component twice in the data analysis. 
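
The per-subject, per-question computation recommended above can be sketched as follows. The record layout, the names, and the figures are hypothetical. The essential point is that the denominator is always the total number of citations retrieved by the same search that produced the relevant ones, for every system.

```python
from statistics import mean, stdev


def precision_ratio(relevant: int, retrieved: int) -> float:
    """Relevant citations divided by all citations retrieved by the same search."""
    if retrieved <= 0:
        raise ValueError("a search must retrieve at least one citation")
    return relevant / retrieved


# Hypothetical records: (system, subject, question, relevant, retrieved).
searches = [
    ("online", "s1", "q1", 8, 20),
    ("online", "s2", "q1", 6, 10),
    ("online", "s1", "q2", 5, 10),
    ("manual", "s1", "q1", 9, 12),
    ("manual", "s2", "q1", 3, 6),
    ("manual", "s1", "q2", 4, 16),
]


def precision_by_system(records):
    """Map each system to (mean, standard deviation) of its per-search ratios."""
    ratios = {}
    for system, _subject, _question, relevant, retrieved in records:
        ratios.setdefault(system, []).append(precision_ratio(relevant, retrieved))
    return {system: (mean(values), stdev(values)) for system, values in ratios.items()}


summary = precision_by_system(searches)
```

In this invented data set both systems have the same mean precision, but the manual system has the larger standard deviation, a difference that a single pooled ratio would hide.
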

Recommendations for Determining Time/Relevant Citation 

As indicated in the previous sections, time is always an 
important variable, and is in some circumstances the most 
important variable. It is generally recognized that time should 
be measured as the time necessary to produce each relevant 
citation. Most of the research studies reviewed measured time in 
this way. This does not mean that there are no problems with 
measuring time. The most important consideration when 
determining time is that all aspects of the time used must be 
included for each search. This is because manual and automated 
systems differ greatly in time consumption; different aspects of 
each take different amounts of time. Thus the total time for 
each subject should include the preparation time, the search 
time, and where precision is not computed, the time needed to 
identify the relevant citations. This total time should then be 
divided by the number of relevant citations produced by the 
subject. Researchers should then report the means and standard 
deviations for each system. 
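
The computation described above can be sketched as follows. The class name, field names, and timings are hypothetical; the sketch assumes that time is recorded in minutes and that screening time is entered as zero when precision is reported separately.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class SearchTiming:
    preparation_min: float
    search_min: float
    screening_min: float  # time spent judging relevance; 0 if precision is reported separately
    relevant_citations: int

    def minutes_per_relevant_citation(self) -> float:
        """Total time for the subject divided by the relevant citations produced."""
        if self.relevant_citations <= 0:
            raise ValueError("undefined when no relevant citations were produced")
        total = self.preparation_min + self.search_min + self.screening_min
        return total / self.relevant_citations


def system_summary(timings):
    """Mean and standard deviation of time per relevant citation for one system."""
    per_citation = [t.minutes_per_relevant_citation() for t in timings]
    return mean(per_citation), stdev(per_citation)


# Hypothetical subjects using the same system.
online = [SearchTiming(10, 15, 5, 10), SearchTiming(5, 20, 5, 6)]
online_mean, online_sd = system_summary(online)
```

Both hypothetical subjects spent 30 minutes in total, but the division by relevant citations yields 3 and 5 minutes per citation respectively, which is the figure that should enter the means and standard deviations.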

Research into searches that are not exhaustive should have 
time per relevant citation as the primary variable. This can be 
accomplished by having searchers stop when they have reached a 
predetermined number of potentially relevant citations. The 
actual number of relevant results should then be determined. 
Time should then be computed as indicated above, and the means 
and standard deviations for the time per relevant citation should be 
reported. 

Recommendations for Determining Costs/Relevant Citation 

The cost of searches is also an important variable, and one 
that suffers from similar problems. East (1980), in an early 
review of cost comparisons, concluded that most studies did not 
directly compare the systems, but actually used crude estimations 
of the costs involved. Many researchers also did not include all 
aspects of costs in their analysis (Calkins, 1977; Elman, 1975; 
Huang & McHale, 1990). Lancaster (1977) identified 10 important 
aspects of cost analysis. These include start up costs, such as 
equipment and storage costs; and ongoing costs, such as materials 
and subscriptions. Staff salaries should be included only if 
this component was not included in the analysis of time. It is 
also important to report standard deviations. Many research 
studies, such as Cohen and Young (1986), were disqualified from 
inclusion in this meta-analysis because they did not report 
standard deviations. 
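
The rule about staff salaries can be made concrete with a brief sketch. The function and its parameters are hypothetical, and the sketch assumes that start-up and ongoing costs have already been apportioned to the individual search; how that apportionment is done is not specified here.

```python
def cost_per_relevant_citation(startup_cost: float,
                               ongoing_cost: float,
                               staff_cost: float,
                               relevant_citations: int,
                               staff_time_analysed_as_time: bool) -> float:
    """Cost attributed to one search divided by the relevant citations it produced.

    Staff salaries are left out when staff time is already analysed under the
    time variable, so that the same component is not counted twice.
    """
    if relevant_citations <= 0:
        raise ValueError("undefined when no relevant citations were produced")
    total = startup_cost + ongoing_cost
    if not staff_time_analysed_as_time:
        total += staff_cost
    return total / relevant_citations


# Hypothetical search: 2.00 start-up share, 6.00 ongoing share, 12.00 staff cost,
# 8 relevant citations produced.
without_staff = cost_per_relevant_citation(2.0, 6.0, 12.0, 8, staff_time_analysed_as_time=True)
with_staff = cost_per_relevant_citation(2.0, 6.0, 12.0, 8, staff_time_analysed_as_time=False)
```

As with time, the per-search values computed this way should be summarized for each system as a mean with its standard deviation.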

Recommendations for Determining User Satisfaction 

It was not possible to include data on satisfaction in this 
meta-analysis because few studies could be located which 
performed adequate analysis on this variable. Most of the 
studies mentioned in the introduction evaluated user satisfaction 
with the two systems by asking a single question similar to: 
which do you prefer, the online catalog or the card catalog? 
User satisfaction with library services cannot be measured by 
asking a single question shortly after a new and exciting service 
has been introduced. User satisfaction can be measured only with 
a multiple question assessment instrument that has been tested 
for reliability and validity. There is a pressing need in 
library research for such an instrument. Other guidelines for 
determining user satisfaction are provided by Lancaster (1977) 
and by Tessier, Crouch and Atherton (1977). 

Recommendations for Future Meta-analyses 

This is, as far as this researcher has been able to 
determine, the first time that meta-analytical procedures have 
been applied to library research. It is hoped that other 
researchers will see the value of this research procedure, and 
apply it to other research questions. There are surely other 
aspects of library research that could benefit from an unbiased 
statistical review, and from a critical analysis of the commonly 
used research methods. Future meta-analyses would also be useful 
on this particular research question, especially if analysis of 
the variance was possible on the variables of level of service 
and quality of methodology. 

Conclusions 

This meta-analysis has produced results that will be of 
value to librarians, to online system developers, and to library 
researchers. The main conclusion for librarians is that there is 
little difference in retrieval effectiveness between paper based 
and computerized information retrieval systems. This is not to 
say that there are no differences between the two systems. Each 
offers unique advantages; and the wisest choice, if economically 
possible, would probably be to provide both services. The 
results of this research merely show that all statements which 
purport that either system provides generally enhanced 
information retrieval are based on something other than 
established fact. 

Online system developers may be interested in the above 
conclusion. They should also note the lack of significance on 
the analysis which compared study publication date with effect 
size. Despite all claims to the contrary, there is no evidence 
that there has been any improvement in either the retrieval 
performance of online systems, or in the ability of searchers to 
effectively use these systems. 

The conclusions for library researchers are twofold. The 
first is that meta-analysis can, and should, be used in reviews 
of the library research. The second conclusion is that the 
research into information retrieval systems has sometimes 
employed methodologies and statistical procedures that are 
flawed. Recommendations are provided to correct some of these 
problems. Some may consider these recommendations too rigorous. 
It should be noted, however, that robust and meaningful research 
results are not possible without rigorous research procedures. 

Appendix 

Individual Study Results 

                         Pub.  Lib.  Search  Computation  Component  Total
Study              Year  Type  Type  Type    of es        es         es

Ohta               1967  J           na      computed     R +0.578   +0.578
Miller             1968  J           comp.   estimated    R -0.522   -1.276
                                                          P -2.498
Virgo              1970  J           na      estimated    R -0.340   -1.346
                                                          P -2.351
Bivans             1974  J           comp.   computed     R -0.047   -0.215
                                                          P -0.047
                                                          T -0.513
Gill               1974  E           comp.   estimated    R +0.019   -0.049
                                                          P -1.640
                                                          T +0.967
                                                          C +0.457
Goodliffe          1974              comp.   computed     R -0.851   -0.420
                                                          T +0.009
Michaels           1975              comp.   estimated    R -0.302   -0.302
Akeroyd & Rogers   197^              comp.   computed     R -1.601   -0.540
                                                          P -0.978
                                                          T +0.949
Langley            1976              comp.   computed     R -0.139   +0.618
                                                          T +1.371
Santodorato        1976              simp.   computed     R -1.41    -1.410
Smith              1977              na      estimated    R -0.360   -0.185
                                                          P -0.459
                                                          T +0.091
                                                          C -0.023
Elchesen           1978  J           comp.   estimated    R +0.360   -0.004
                                                          P -0.737
                                                          C +0.268
Cksuke & Pease     1982              simp.   estimated    R -0.971   -0.857
                                                          C -0.750
Hartley            1983              na      estimated    R +0.466   -0.198
                                                          P -0.863
Kranich            1984              simp.   estimated    R -1.035   -1.035
Murphy             1985              comp.   computed     R +0.076   +0.165
                                                          T +0.487
                                                          C -0.071
Naber              1985              comp.   estimated    R +4.750   +4.750
Poynard & Conn     1985              comp.   estimated    R -1.900   -1.900
Rogers             1985              na      estimated    R -3.358   +0.320
                                                          T +1.981
                                                          C +2.341
Akaho, Bandai      1986  J     S     comp.   estimated    R +0.149   +0.360
  & Fujii                                                 T +4.352
                                                          C -3.419
Frid               1987  C           simp.   computed     R +0.087   -0.231
                                                          T -0.531
Bernstein          1988  J     S     comp.   computed     R -1.570   -1.570
Havener            1988  D     A     both    computed     R +0.135   -0.025
                                     both                 T -0.182
                                     simp.                  -0.160
                                     comp.                  +1.090
Reese              1988  J     A     simp.   estimated    R -1.950   -1.950
Edmonds, Moore     1989  E     P     simp.   estimated    R -2.831   -2.851
  & Balcom

Note. 

Pub. Type (Publication Type): J = Journal, E = ERIC, C = Conference, 
R = Report, D = Dissertation. 

Lib. Type (Library Type): A = Academic, S = Special, P = Public. 

Computation of es: comp. = computed, simp. = simple, 
na = not available. 

Component: R = Recall, P = Precision, T = Time, C = Cost. 